Laryngectomee Speech Enhancement Using Voice Conversion Techniques

Author

  • Arantza del Pozo
Abstract

People who suffer from larynx cancer are often laryngectomized and, as a consequence, lose their ability to produce normal speech. Despite advances in alaryngeal speech rehabilitation and restoration, laryngectomee speech is of poorer quality and intelligibility than natural speech. This project explores voice conversion as an alternative method for enhancing the quality and intelligibility of laryngectomee speech. An initial comparison of normal and laryngectomee glottal excitations confirms that the cause of the disorder is a voicing problem. Several experiments then attempt to improve the perceptual quality of laryngectomee speech. The first checks that laryngectomee articulation is sound, and the rest replace the laryngectomee excitation with a better glottal source. We have found that residuals do not model the differences between normal and laryngectomee excitation, and that glottal waveforms need to be mapped instead. However, the difficulty of estimating glottal waveforms hinders a straightforward continuous glottal mapping, producing converted utterances with artefacts that remain to be resolved.

Declaration

I, Arantza del Pozo of Christ's College, being a candidate for the M.Phil in Computer Speech, Text and Internet Technology, hereby declare that this dissertation and the work described in it are my own work, unaided except as may be specified below, and that the dissertation does not contain material that has already been used to any substantial extent for a comparable purpose. The source code can be found under ~ad371/Project.

Acknowledgements

I would like to thank my supervisor Steve Young for his guidance and for always finding time to discuss my doubts. Many thanks to KK Hui Ye for providing me with the details and code for the analysis and synthesis of speech signals and for responding to my questions concerning its modification. Thanks to Antonia Kilcommons and Rachael Beddard for contacting the patients, organising the recording slots and providing facilities to build the laryngectomee data collection. Special thanks also to all the patients who took the time to record the utterances for the speech corpus.

Fig. 12 Glottal excitation waveforms obtained for different normal male (a) and female
Fig. 13 Glottal excitation waveforms obtained for the normal target speaker (a) and the different tracheoesophageal speakers from the data collection (b)
Fig. 14 Glottal excitation spectra of the normal target (a) and different tracheoesophageal speakers in the speech corpus (b)

Related resources

Statistical Voice Conversion Techniques for Alaryngeal Speech Enhancement

This position paper gives a brief overview of our developed technologies for enhancing alaryngeal speech (AL speech) uttered by laryngectomees. There are several alternative speaking methods for laryngectomees to produce AL speech. However, any type of AL speech suffers from lack of naturalness and speaker individuality (identity). To address this issue, we have developed statistical voice conv...

An Evaluation through Simulation of Electrolarynx Control based on Statistical F0 Prediction for Multiple Speakers

An electrolarynx is a device that artificially generates excitation sounds to produce electrolaryngeal (EL) speech. Although proficient laryngectomees can produce intelligible EL speech by using this device, it sounds quite unnatural due to the mechanical excitation. To address this issue, we have proposed several EL speech enhancement methods using statistical voice conversion and showed that ...

Enhancement of Esophageal Speech Using Statistical Voice Conversion

This paper presents a novel method of enhancing esophageal speech based on statistical voice conversion. Esophageal speech is one of the speaking methods for total laryngectomees. Although it allows laryngectomees to speak by generating a sound source and articulating it to produce audible speech sounds using their esophagus and vocal organs, the generated voices sound unnatural. To improve the...

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction

An electrolarynx is a device that artificially generates excitation sounds to produce electrolaryngeal (EL) speech. Although proficient laryngectomees can produce intelligible EL speech by using this device, it sounds quite unnatural due to the mechanical excitation. To address this issue, we have proposed several EL speech enhancement methods using statistical voice conversion and showed that ...

Evaluation of Excitation Feature Prediction in a Hybrid Approach to Electrolaryngeal Speech Enhancement

We remove micro-prosody with low-pass filtering and avoid unvoiced/voiced (U/V) prediction to improve statistical excitation prediction in a hybrid approach to electrolaryngeal (EL) speech enhancement. An electrolarynx is a device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient larynge...

Publication date: 2004